Greedy Attribute Selection

Authors

  • Rich Caruana
  • Dayne Freitag
Abstract

Many real-world domains bless us with a wealth of attributes to use for learning. This blessing is often a curse: most inductive methods generalize worse given too many attributes than if given a good subset of those attributes. We examine this problem for two learning tasks taken from a calendar scheduling domain. We show that ID3/C4.5 generalizes poorly on these tasks if allowed to use all available attributes. We examine five greedy hillclimbing procedures that search for attribute sets that generalize well with ID3/C4.5. Experiments suggest hillclimbing in attribute space can yield substantial improvements in generalization performance. We present a caching scheme that makes attribute hillclimbing more practical computationally. We also compare the results of hillclimbing in attribute space with FOCUS and RELIEF on the two tasks.
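The idea of hillclimbing in attribute space with cached evaluations can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the five procedures or the caching scheme, so the stepwise add-or-remove search, the subset-keyed memo cache, and the names `stepwise_hillclimb`, `toy_score`, and `USEFUL` are all assumptions made for illustration. In a real experiment, `evaluate` would train ID3/C4.5 on the chosen attributes and return an estimate of generalization accuracy; here a toy scoring function stands in for that step.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Subset = FrozenSet[str]


def stepwise_hillclimb(
    attributes: Sequence[str],
    evaluate: Callable[[Subset], float],
) -> Tuple[Subset, float, int]:
    """Greedy hillclimbing over attribute subsets (illustrative sketch).

    Starts from the empty set and repeatedly moves to the best neighbor,
    where a neighbor differs by adding or removing one attribute. Stops
    when no neighbor scores strictly higher than the current subset.
    Returns (selected subset, its score, number of distinct evaluations).
    """
    cache: Dict[Subset, float] = {}  # assumed scheme: memoize by subset

    def score(subset: Subset) -> float:
        # Each distinct subset is evaluated at most once.
        if subset not in cache:
            cache[subset] = evaluate(subset)
        return cache[subset]

    current: Subset = frozenset()
    best = score(current)
    while True:
        neighbors: List[Subset] = [
            current | {a} for a in attributes if a not in current
        ]
        neighbors += [current - {a} for a in attributes if a in current]
        if not neighbors:
            break
        top = max(neighbors, key=score)
        if score(top) <= best:
            break  # local optimum: no single change improves the score
        current, best = top, score(top)
    return current, best, len(cache)


# Toy stand-in for "train a learner and measure accuracy" (hypothetical):
# relevant attributes help, irrelevant ones cost a small penalty.
USEFUL = {"a", "c"}


def toy_score(subset: Subset) -> float:
    return len(subset & USEFUL) - 0.1 * len(subset - USEFUL)


selected, best_score, n_evals = stepwise_hillclimb(["a", "b", "c", "d"], toy_score)
print(sorted(selected), best_score, n_evals)
```

Because removal steps revisit subsets that were already scored on the way up, the memo cache saves evaluations even in this small example. With an expensive learner behind `evaluate`, that saving is what makes the search more practical.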

Similar resources

USP-EACH: Improved Frequency-based Greedy Attribute Selection

We present a follow-up of our previous frequency-based greedy attribute selection strategy. The current version takes into account also the instructions given to the participants of TUNA trials regarding the use of location information, showing an overall improvement on string-edit distance values driven by the results on the Furniture domain.

USP-EACH Frequency-based Greedy Attribute Selection for Referring Expressions Generation

Both greedy and domain-oriented REG algorithms have significant strengths but tend to perform poorly according to humanlikeness criteria as measured by, e.g., Dice scores. In this work we describe an attempt to combine both perspectives into a single attribute selection strategy to be used as part of the Dale & Reiter Incremental algorithm in the REG Challenge 2008, and the results in both Furn...

A Greedy Correlation Measure Based Attribute Clustering Algorithm for Gene Selection

This paper proposes an attribute clustering algorithm for grouping attributes into clusters so as to obtain meaningful modes from microarray data. First the problem of attribute clustering is analyzed and neighborhood mutual information is introduced to solve it. Furthermore, an attribute clustering algorithm is presented for grouping attributes into clusters through optimizing a criterion func...

Consistency-preserving attribute reduction in fuzzy rough set framework

Attribute reduction (feature selection) has become an important challenge in areas of pattern recognition, machine learning, data mining and knowledge discovery. Based on attribute reduction, one can extract fuzzy decision rules from a fuzzy decision table. As consistency is one of several criteria for evaluating the decision performance of a decision-rule set, in this paper, we devote to prese...

Submodular Attribute Selection for Action Recognition in Video

In real-world action recognition problems, low-level features cannot adequately characterize the rich spatial-temporal structures in action videos. In this work, we encode actions based on attributes that describe actions as high-level concepts, e.g., jump forward or motion in the air. We base our analysis on two types of action attributes. One type of action attribute is generated by humans. ...

Publication date: 1994